Objective and Comprehensive Evaluation of Bisulfite Short Read Mapping Tools

نویسندگان

  • Hong Tran
  • Jacob Porter
  • Ming-an Sun
  • Hehuang Xie
  • Liqing Zhang
چکیده

Background. Large-scale bisulfite treatment and short reads sequencing technology allow comprehensive estimation of methylation states of Cs in the genomes of different tissues, cell types, and developmental stages. Accurate characterization of DNA methylation is essential for understanding genotype phenotype association, gene and environment interaction, diseases, and cancer. Aligning bisulfite short reads to a reference genome has been a challenging task. We compared five bisulfite short read mapping tools, BSMAP, Bismark, BS-Seeker, BiSS, and BRAT-BW, representing two classes of mapping algorithms (hash table and suffix/prefix tries). We examined their mapping efficiency (i.e., the percentage of reads that can be mapped to the genomes), usability, running time, and effects of changing default parameter settings using both real and simulated reads. We also investigated how preprocessing data might affect mapping efficiency. Conclusion. Among the five programs compared, in terms of mapping efficiency, Bismark performs the best on the real data, followed by BiSS, BSMAP, and finally BRAT-BW and BS-Seeker with very similar performance. If CPU time is not a constraint, Bismark is a good choice of program for mapping bisulfite treated short reads. Data quality impacts a great deal mapping efficiency. Although increasing the number of mismatches allowed can increase mapping efficiency, it not only significantly slows down the program, but also runs the risk of having increased false positives. Therefore, users should carefully set the related parameters depending on the quality of their sequencing data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Updates to the RMAP short-read mapping software

SUMMARY We report on a major new version of the RMAP software for mapping reads from short-read sequencing technology. General improvements to accuracy and space requirements are included, along with novel functionality. Included in the RMAP software package are tools for mapping paired-end reads, mapping using more sophisticated use of quality scores, collecting ambiguous mapping locations and...

متن کامل

Algorithms and tools for the analysis of high throughput DNA sequencing data

High-throughput DNA sequencing technologies make it possible to determine the order of the nucleotides adenine, cytosine, guanine and thymine in DNA samples, resulting in millions of short strings (reads) over the alphabet (A, C, G, T). Advances in biological and biomedical research rely on the ability of bioinformatics to make sense out of that data with novel algorithms and tools. In this the...

متن کامل

SVM2: an improved paired-end-based tool for the detection of small genomic structural variations using high-throughput single-genome resequencing data

Several bioinformatics methods have been proposed for the detection and characterization of genomic structural variation (SV) from ultra high-throughput genome resequencing data. Recent surveys show that comprehensive detection of SV events of different types between an individual resequenced genome and a reference sequence is best achieved through the combination of methods based on different ...

متن کامل

A Segemehl Manual (version 0.1.7; Rev 1)

segemehl is a software to map short sequencer reads to reference genomes. Unlike other methods, segemehl is able to detect not only mismatches but also insertions and deletions. Furthermore, segemehl is not limited to a specific read length and is able to map primeror polyadenylation contaminated reads correctly. segemehl implements a matching strategy based on enhanced suffix arrays (ESA). seg...

متن کامل

meRanTK: methylated RNA analysis ToolKit

UNLABELLED The significance and function of posttranscriptional cytosine methylation in poly(A)RNA attracts great interest but is still poorly understood. High-throughput sequencing of RNA treated with bisulfite (RNA-BSseq) or subjected to enrichment techniques like Aza-IP or miCLIP enables transcriptome wide studies of this particular modification at single base pair resolution. However, to da...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 2014  شماره 

صفحات  -

تاریخ انتشار 2014